The IncP-1 plasmid backbone adapts to different
host bacterial species and evolves through
homologous recombination

Peter Norberg 1,2 , Maria Bergstrom 1 , Vinay Jethava 3 , Devdatt Dubhashi 3 & Malte Hermansson 1



Plasmids are important members of the bacterial mobile gene pool, and are among the most 
important contributors to horizontal gene transfer between bacteria. They typically harbour 
a wide spectrum of host beneficial traits, such as antibiotic resistance, inserted into their 
backbones. Although these inserted elements have drawn considerable interest, evolutionary 
information about the plasmid backbones, which encode plasmid related traits, is sparse. Here 
we analyse 25 complete backbone genomes from the broad-host-range IncP-1 plasmid family.
Phylogenetic analysis reveals seven clades, in which two plasmids that we isolated from a 
marine biofilm represent a novel clade. We also found that homologous recombination is a 
prominent feature of the plasmid backbone evolution. Analysis of genomic signatures indicates 
that the plasmids have adapted to different host bacterial species. Globally circulating IncP-1
plasmids hence contain mosaic structures of segments derived from several parental plasmids 
that have evolved in, and adapted to, different, phylogenetically very distant host bacterial 
species. 
The ability of prokaryotes to exchange genes by means of
horizontal gene transfer (HGT) has far-reaching implications
for our understanding of prokaryotic evolution 1-4 . Among
the most important contributors to HGT are conjugative plasmids,
which are self-replicating extrachromosomal units that code for
their own cell-to-cell conjugal transfer systems. The plasmid
backbone, which contains genes encoding plasmid-related traits, such
as replication control and conjugation functions, is usually loaded
with accessory genes, such as antibiotic-resistance and
heavy-metal-resistance genes. These are themselves often part of other
mobile genetic elements (MGEs), such as transposons and integrons.
Plasmids are important in bacterial evolution and in adaptation
to environmental changes, because they may carry genes that
are useful to the host bacterium. The resulting fitness of a plasmid
can therefore be thought of as the sum of a 'selfish' component,
including conjugative transfer, replication and various maintenance
functions, and a component that confers advantages on the host
cell, exemplified by antibiotic-resistance genes 5 .

The development of antibiotic resistance in pathogenic bacteria
is a serious and growing health concern. One particularly
problematic development is the emergence of multiresistance; that is,
bacteria becoming resistant to many, if not all, medically used
antibiotics. Plasmids have an important role in the spread of
antibiotic-resistance genes between bacteria and in the development of
multiresistance 6-8 . Knowledge of the manner in which plasmids evolve is
thus important if we are to better understand the fundamentals of
prokaryotic evolution and the principles underlying the accumulation
and spread of antibiotic resistance in bacterial communities.

Research into IncW plasmids 9 and F plasmids 10 has suggested
that these plasmids undergo recombination, and that rare
recombination events may be a driving force behind the creation of
new plasmid families. The IncP-1 plasmid group has a broad host
range and can be stably maintained in almost all Gram-negative
bacteria. IncP-1 plasmids have also been demonstrated to conjugate
to Gram-positive bacteria 11 and to yeast and eukaryotic cell
lines 12,13 . A recent study using genomic signatures also suggested
a broad host range of the IncP-1 plasmids 14 . Furthermore, they can
harbour a wide spectrum of antibiotic-resistance genes 7 . Five
evolutionary clades have hitherto been described for IncP-1 plasmids:
α-clade 15 , β-clade 16 , γ-clade 17,18 , δ-clade 17 and ε-clade 19 .
Several previous studies of the evolution of these plasmids focus on
differences in the MGEs incorporated into the backbone 20-22 .
Incorporation and expulsion of such elements occur more frequently
than do changes in the core backbone, as exemplified by plasmids with
similar backbones that harbour different transposons (refs 15, 20, 23;
and the present report), and thus provide information on the
relatively recent evolution of the plasmids. Analysis of long-term
evolution, however, should preferably be based on 'deep characters', and
analysis of the plasmid backbone may reveal important information
on how these plasmids evolve and adapt to their hosts.

Information about recombination of the IncP-1 plasmid backbone
has hitherto been sparse, except in a few studies in which
occasional recent recombination events were suggested 19,24 . It has been
suggested that recent human activities, such as the use of wastewater
treatment plants that mix bacteria from a large number of sources,
would increase contacts between bacteria and therefore increase
recombination between plasmids 7 . Furthermore, the increased
mobility of people and goods would be expected to increase the
worldwide spread of these plasmids. Isolation of similar plasmid
backbone sequences from different parts of the world seems to
support this hypothesis 19 .

Here we analysed the complete backbone genomes of 25 IncP-1
plasmids, including two novel plasmids from the marine
environment. We demonstrate that recombination is not only a recent
phenomenon induced by human interference but also has been
a continuous and prominent feature of the IncP-1 backbone
evolution. Considering recombination, we describe a consensus
phylogeny of the IncP-1 plasmids presenting a divergence into
seven distinct clades. We also analysed plasmid DNA signatures
and suggest that the IncP-1 plasmids have different host species
histories, and that the plasmids have been temporarily isolated in
different host bacteria for sufficiently long times for their genomic
signatures to have been influenced.

Results 

Plasmid backbone analysis. We analysed the complete backbone
DNA sequences of two novel IncP-1 plasmids, designated as
pMCBF1 and pMCBF6, isolated from a marine biofilm 25 , and
compared them with 23 previously described IncP-1 plasmids
retrieved from GenBank (found through BLAST and literature
searches). These include the IncP-1 plasmids that resulted from a
recent thorough plasmid search 14 . Plasmids pMCBF1 (62,689 bp)
and pMCBF6 (66,729 bp) presented identical backbones and
differed only in their mercury-resistance transposons; the common
backbone will hereafter be referred to as pMCBF1. Putative gene
functions are shown in Tables 1 and 2.

The genetic distance between the amino-acid (AA) sequence
of each backbone gene in pMCBF1 and the corresponding genes
in the 23 previously described IncP-1 plasmids was estimated by a
maximum likelihood approach. The backbone gene content in the
25 plasmids differs significantly and only 24 homologues of the 41
backbone genes in pMCBF1 were present in all analysed plasmids
(Fig. 1). The AA similarity also differed widely, with trbD being the
most conserved gene. Among all 23 plasmids, plasmid pB4 presents
the shortest genetic distance to pMCBF1 in genes trbK, trbL, traG
and traO, whereas pB4 genes traC2 and traK present the longest
genetic distance. Similarly, the pKJK5 genes trbB, trbE, trbJ, traH,
traJ, klcB and klcA presented the shortest, and the two genes upf30.5
and kleB in the same plasmid presented the longest, genetic distance
to pMCBF1. Only plasmids pAKD4 and pQKH54 did not have any
gene with the shortest genetic distance to pMCBF1. Such variation
in relative genetic distances may be explained either by unequal
nucleotide substitution rates or by an evolutionary history including
homologous recombination (that is, the different genes
in each plasmid backbone have different ancestries).

To reconstruct their evolutionary history, it was necessary to
base the phylogenetic analysis on backbone regions that are
conserved and present in all 25 plasmids. Three such relatively large
regions were identified and are here referred to as regions A, B and C
(Fig. 1). Region A was further divided into subregions A1 and A2 to
decrease its size. Region A1 contains the seven genes trfA, ssb, trbA,
trbB, trbC, trbD and trbE. Although the AA sequences for the genes
ssb and trbE in plasmids pEST4011 and pBS228, respectively, were
not available because of 'truncation by insertion', the counterparts
of the genes were still present, allowing them to be included in the
analysis. Region A2 contains the seven genes trbF to trbL. Region B
contains the 11 genes traE to traO, and region C contains the five genes
kfrA, korB, korA, incC and kleE. The DNA sequences were aligned
and gap regions were excluded before further analyses. The four
regions were also concatenated and analysed as one large (~19,000
nucleotides) segment. Plasmid pIJB1 was previously described as a
recombinant 26 with a duplication of the genes trfA to trbE. In this
study, we included the second duplicate in the analysis to analyse
an intact A region.
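
The alignment preparation described above (dropping gap regions, then joining the regions into one segment) can be illustrated with a minimal sketch. It assumes each region is held as a dictionary mapping plasmid name to an aligned sequence of equal length; the function names and the toy sequences are hypothetical and are not the software or data used in the study.

```python
def strip_gap_columns(alignment):
    """Drop every alignment column in which at least one sequence has a gap."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(length)
            if all(alignment[n][i] != "-" for n in names)]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def concatenate_regions(regions):
    """Join per-region alignments end to end, keeping only plasmids present in all."""
    shared = set.intersection(*(set(r) for r in regions))
    return {name: "".join(r[name] for r in regions) for name in sorted(shared)}


# Toy alignments: plasmid names are from the text, sequences are invented.
region_a = {"pMCBF1": "ATG-CA", "R751": "ATGGCA", "pB4": "AT-GCA"}
region_b = {"pMCBF1": "TTGA", "R751": "TTGA", "pB4": "T-GA"}

cleaned = [strip_gap_columns(r) for r in (region_a, region_b)]
combined = concatenate_regions(cleaned)
print(combined)
```

Restricting the concatenation to plasmids present in every region mirrors the way a plasmid lacking one region has to be left out of a concatenated analysis.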

Phylogenetic analysis of the IncP-1 backbone. A splits network
(Fig. 2a) was initially constructed for 1,000 bootstrap replicates of
the concatenated segments A1, A2, B and C of 24 IncP-1 plasmids
(plasmid pEST4011 was excluded from the analysis as it lacks the
genes in A2). The network, which presents a combinatorial
generalization of phylogenetic trees, presented a star-like topology with
seven main clades. pMCBF1 formed a novel clade, hereafter called
ζ. As visible in a previous study 26 , the β-clade 16 could be divided into
Table 1 | Location and putative function of the predicted
coding regions of pMCBF1.

Positions         Gene name*      Function*
D 1-315           trbA            Mating pair formation (Mpf) regulation
[illegible]       trbB            Mpf, ATPase, protein kinase
[illegible]       trbC            Mpf
[illegible]       trbD            Mpf
[illegible]       trbE            Mpf
[illegible]       trbF            Mpf
[illegible]       trbG            Mpf
[illegible]       trbH            Mpf
[illegible]       trbI            Mpf
[blank]           trbJ            Mpf
[illegible]       trbK            Entry exclusion
[illegible]       trbL            Mpf, topoisomerase
[illegible]       trbM            Mpf
[illegible]       trbN            Mpf
[illegible]       trbP            Mpf
[illegible]       upf30.5         Outer membrane protein
[illegible]       orf 77          Hypothetical prot.
[illegible]       [blank]         Transposition
D 17867-18595     tniB            NTP binding
[illegible]       [blank]         Transposition of Tn5053
[illegible]       tniR            Resolvase
[illegible]       orf 22          Hyp. open reading frame
[illegible]       merE (urf-1)    Mercury resistance
[illegible]       merD            Regulation
[illegible]       merA            Mercury reductase
[illegible]       merF            Mercury transporter
[illegible]       merR            Regulatory prot.
[illegible]       merP            Mercury binding
[illegible]       merT            Mercury transport
[illegible]       merR            Regulatory prot.
[illegible]       resA            Resolvase
[illegible]       yacC            Hyp. prot. with exonuclease domain
[illegible]       traC2           DNA primase
[illegible]       traD            DNA transfer
[illegible]       traE            DNA topoisomerase
[illegible]       traF            Maturation peptidase
[illegible]       traG            DNA transport during transfer
[illegible]       traH            Relaxosome stabilization
[illegible]       traI            DNA relaxase
[illegible]       traJ            oriT binding
[illegible]       traK            oriT binding
[illegible]       traL            Transfer protein, topoisomerase
[illegible]       traM            Transfer protein
[illegible]       traN            Muraminidase
[illegible]       [illegible]     [blank]
[illegible]       orf 4b          Transcription regulator, LysR family
[illegible]       oprN?           Multi-drug efflux (MDE) outer membrane prot., NodT family
[illegible]       oqxB mexF       MDE transporter
[illegible]       mexE            MDE membrane fusion prot.
[illegible]       ispSl           [blank]
[illegible]       orf 50          Membrane prot.
[illegible]       ispbi           Transposase
[illegible]       tnpA            Transposase
[?] 50545-51546   kfrA            Regulation, transcriptional repressor
[illegible]       korB            Regulation, transcriptional repressor
[illegible]       korA            Regulation, transcriptional repressor
[illegible]       incC            Regulation, partition
C 54107-54433     kleE            Stable inheritance
C 54596-54814     kleB            Stable inheritance
C 54871-55107     kleA            Stable inheritance
C 55241-55498     korC            Regulation, transcriptional repressor
C 55488-56612     klcB            Stable inheritance
C 56840-57652     istB ?          ATPase
C 57642-59135     orf 63          Resolvase
C 59386-59850     klcA            Antirestriction system
C 60971-62410     trfA            DNA binding, replication initiation
C 62179-62523     ssb             Single-stranded DNA binding

Hyp., hypothetical; NTP, nucleoside 5'-triphosphate; prot., protein.
[illegible], entry not recoverable from the extracted text; [blank], cell empty in the extracted text; [?], strand letter not recoverable.
*By similarity to sequences in GenBank, nucleotide.


Table 2 | Location and putative function of the predicted
coding regions of transposon Tn5058 in pMCBF6.

Positions         Gene name*      Function*
D 16099-17817     [blank]         [illegible]
[illegible]       [blank]         Hyp. prot.
D 17844-18778     tniB            NTP binding protein
[illegible]       tniQ            Transposition
[illegible]       tniR            Resolvase
[illegible]       tniM            Modulator of transposition
[illegible]       [blank]         Mercury transport
C 21225-21626     merD            Regulation
[illegible]       merB            Organomercurial lyase
[illegible]       [blank]         Regulation
[illegible]       [blank]         Hyp. prot.
C 22542-22724     merR            Regulation
C 23820-24458     merB1           Organomercurial lyase
C 24439-25192     merG            Organomercury resistance
C 25228-27084     merA            Mercury reductase
C 27125-27436     merP            Mercury transport
C 27613-28017     merT            Mercury transport
D 27701-28270     merR1           Regulation
C 27827-28189     [blank]         Hyp. prot.
C 27970-28152     [blank]         Hyp. prot.

Hyp., hypothetical; NTP, nucleoside 5'-triphosphate; prot., protein.
[illegible], entry not recoverable from the extracted text; [blank], cell empty in the extracted text.
*By similarity to sequences in GenBank, nucleotide.



two subclades, β-1 and β-2. Parallel edges in the phylogenetic
network indicated, however, conflicting phylogenetic signals, possibly
resulting from homologous recombination. In particular, in
addition to plasmid pIJB1, plasmid pAOVO02 was a putative
recombinant, not clustering to any of the above-described clades. A second
network, excluding these two plasmids, was therefore constructed
for comparison (Fig. 2b).

Recombination analysis. To investigate whether the conflicting
phylogenetic signals are caused by homologous recombination
or homoplasy, we initially used a statistical test, the Φ-test, which
was recently described to yield reliable results for diverged DNA
sequences 27 . We analysed the complete concatenated segment, as
well as three regions separately, to analyse the frequency and
location of recombination crossovers (segments A1 and A2 were analysed
as one segment A to decrease bias of multiple testing). To estimate
the frequency of recombinant plasmids, we also divided the data
set into six representative subgroups. These subgroups were selected
on the basis of clade identity to analyse possible recombination
events within the β-1 subclade, which harbours enough members
to perform such analysis, and between the different clades. Because
all three α-clade plasmids have identical backbone sequences, and
because the ε, γ, δ and ζ clades were represented by single
backbones, it was impossible to investigate whether recombination had
occurred within these clades. Consequently, the Φ-test was applied
to 28 data sets (the full set and the six subgroups, each analysed as
the concatenated segment and as segments A, B and C). After a
Bonferroni correction for multiple tests, the significance level was
set to P = 0.05/28 ≈ 0.002. The results (Table 3)
indicated strong statistical significance (P < 0.002) for recombination
in the vast majority of the data sets. There was no statistically
significant support for recombination crossovers within the three separate
segments of the β-1 subclade plasmids or for the A-segment of the
data set containing plasmids within subclade β-2 and pKJK5 or for
the B-segment of the data set containing pQKH54, pMCBF1, RK2
and pTP6. However, there was high statistically significant support
for recombination when the three concatenated segments were
analysed, indicating that recombination crossovers are located
between, but not necessarily within, the three investigated regions.
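
The Bonferroni threshold quoted above follows from dividing the nominal level by the number of tests. The short sketch below reproduces that arithmetic; the function names are illustrative, and the two P-values are invented for the example rather than taken from Table 3.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def flag_significant(p_values, alpha, n_tests):
    """Map each data-set label to True when its P-value is below the corrected level."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return {label: p < threshold for label, p in p_values.items()}


# 0.05 spread over the 28 data sets described in the text.
threshold = bonferroni_threshold(0.05, 28)

# Hypothetical P-values for two of the 28 data sets (not taken from Table 3).
example = {"all plasmids, A+B+C": 1e-6, "beta-1 subclade, A": 0.4}
flags = flag_significant(example, alpha=0.05, n_tests=28)
print(round(threshold, 4), flags)
```

The exact quotient is slightly below 0.002, so rounding to 0.002 makes the stated cut-off marginally more permissive than the strict correction.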

To further explore and visualize putative recombination
crossovers, we used the Bootscan method, which uses a sliding-window
[Figure 1 image: colour-coded matrix of per-gene genetic distances (rows: backbone genes, from trfA onward; columns: the other analysed plasmids, grouped by clade). Column labels and cell values are not recoverable from the extracted text.]

Figure 1 | Genetic distances between pMCBF1 and other fully sequenced IncP-1 plasmids. Genetic distances between each gene in pMCBF1 and the
corresponding genes in the other 23 analysed plasmids. The plasmid(s) with the longest distance to pMCBF1 is marked in red and the plasmid(s) with
the shortest distance is marked in blue for each gene. Genes not present in specific plasmids are marked with '-' and genes that are at least partially
present but not expressed as proteins, or proteins not annotated in GenBank, are marked with '*'. Three genomic regions, A (further divided into two
subregions, A1 and A2), B and C, were identified as suitable targets for further phylogenetic and signature analysis as they were present in all plasmids.



approach, in which a window of a fixed size is moved step-by-step
through the sequence alignment. In each step a phylogenetic tree
with bootstrap values for each clade is created. The putative
recombinant is selected as the query, and the bootstrap support for each of
the other plasmids being the one that clusters closest to the query is
plotted. Recombination crossovers are indicated as sudden changes
in bootstrap supports. Similarity plots were also constructed using
a similar sliding-window approach, illustrating the DNA sequence
similarity between the query and the other sequences.
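
The sliding-window idea behind the similarity plots can be sketched in a few lines. This is only an illustration of windowed percent identity between a query and one reference; it does not reproduce Bootscan (which builds a bootstrapped tree per window) or the SimPlot program, and the window size, step size and sequences are assumptions chosen for the example.

```python
def window_similarity(query, reference, window=200, step=20):
    """Return (window midpoint, percent identity) pairs along a pairwise alignment."""
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        # Compare only columns in which neither sequence has a gap.
        pairs = [(a, b) for a, b in zip(q, r) if a != "-" and b != "-"]
        if not pairs:
            continue
        identity = 100.0 * sum(a == b for a, b in pairs) / len(pairs)
        points.append((start + window // 2, identity))
    return points


# Toy alignment: the reference matches the query in the first half only.
query = "ACGT" * 10
reference = "ACGT" * 5 + "TTTT" * 5
profile = window_similarity(query, reference, window=20, step=20)
print(profile)
```

A sharp drop in identity between neighbouring windows, as in the toy profile, is the kind of signal that would prompt a closer look for a recombination crossover.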

The Bootscan and similarity plots support recombination.
One example is pAOVO02, which showed a pattern consistent
with recombination between the putative parental plasmids R751,
pA1 and pKJK5 (Fig. 3a). These were also supported as parental
plasmids by the similarity plot, except for pKJK5, which showed a
lower similarity to pAOVO02 than the other two. Another
example is pB3, which generally presented the closest evolutionary
relationship to R751 (Fig. 3b) and a close sequence similarity (>95%
on average). In a specific pB3 region, however, the Bootscan plot
indicated a closer evolutionary relationship to pKJK5, even though
the sequence similarity was only 68-88%. A similar alteration
in bootstrap support was seen for pB10 (Fig. 3c), which mostly
showed the closest relationship to R751 except in one region that
was more related to plasmid pA1, supporting a previous suggestion
about recombination in pB10 (ref. 24). The SimPlot also indicated a
generally high similarity of >95% to R751 and a high similarity to
pA1 in the specific region. Finally, additional SimPlot analyses were
Figure 2 | Phylogenetic analysis of the IncP-1 plasmid backbone.

(a) Phylogenetic network based on the concatenated backbone regions
A, B and C of 25 IncP-1 plasmids. The network displays seven main clades,
including a novel clade containing the two newly sequenced plasmids
pMCBF1 (in bold) and two sub-clades, β-1 and β-2, of the previously
described β-clade. The putative recent recombinant plasmids pIJB1
and pAOVO02 are marked with red ellipses. (b) Phylogenetic network
excluding the putative recent recombinant plasmids pIJB1 and pAOVO02.

performed to investigate the ancestry of specific recombination
fragments. For example, plasmids pB3 and pBP136 shared almost
identical sequences with plasmid R751, except in a few regions in
which the sequence similarity was significantly lower (Fig. 4a). When
pBP136 (Fig. 4b) and pB3 (Fig. 4c) were compared with all other
plasmids studied here, none of them presented high similarities
in these regions for plasmid pBP136 and only plasmid pAOVO02
showed a high similarity in the specific region of pB3. A BLAST
search identified no sequence with close similarity to the three
regions in pBP136. In summary, the Φ-test supports
recombination between IncP-1 plasmids, and the Bootscan and
similarity plots further illustrate the recombination crossovers.

Analysis of genomic signatures. Species specificity of a bacterium
can be determined by examining its genomic signature (nucleotide
patterns found in its DNA) using different approaches. One such
approach is the study of genomic compositions of oligomers of
different lengths, so-called DNA words 28 . The basis for a particular
word frequency rests on a multitude of physicochemical properties,
such as base stacking energy, propeller twist angle, bendability,
position preference and protein deformability, but is also influenced by
the codon usage and GC contents of the DNA 29 . Once a plasmid
conjugates to a new host, its signature will ameliorate towards that
of the host.
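
As an illustration of what a DNA-word signature is, the sketch below counts overlapping words of length k and compares two frequency profiles. The Euclidean distance and the toy sequences are assumptions made for the example; the published algorithms applied in this study (refs 30, 31) are not reproduced here.

```python
from collections import Counter
from itertools import product
import math


def word_frequencies(seq, k):
    """Relative frequency of every overlapping DNA word of length k (A, C, G, T only)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)  # words with ambiguous bases are ignored
    if total == 0:
        raise ValueError("sequence too short or without unambiguous words")
    return {w: counts[w] / total for w in words}


def signature_distance(freq_a, freq_b):
    """Euclidean distance between two word-frequency profiles (an assumed metric)."""
    return math.sqrt(sum((freq_a[w] - freq_b[w]) ** 2 for w in freq_a))


# Invented sequences: one plasmid segment and two candidate host genomes.
segment = word_frequencies("ACGTACGTAC", k=2)
host_a = word_frequencies("ACGTACGTACGT", k=2)
host_b = word_frequencies("GGGGCCCCGGGG", k=2)

d_a = signature_distance(segment, host_a)
d_b = signature_distance(segment, host_b)
print(d_a < d_b)
```

A longer word length gives a more specific signature but needs more sequence to estimate the 4^k frequencies reliably, which is why k is left as a parameter.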

By applying recently developed algorithms 30,31 , we analysed the
genomic signatures in the plasmid backbones to identify putative
bacterial hosts. We first created a genomic profile for each of all
1,047 bacterial complete genomic DNA sequences currently
available from GenBank. The genomic signatures in the four segments A1,
A2, B and C for each of the 25 plasmids were then matched against
these profiles. To test for statistical significance, we started by
investigating whether any of the bacterial species within the genus that
contained the best match had a high probability of being the host.
If no significance was found on the genus level, we stepped up one
taxonomic level, testing all members in that specific family. If
statistical significance was still not detected, this procedure was repeated
until we reached the class level. Thus, the P-value indicates whether
the signature in a plasmid segment is significantly similar to the
signatures of the species in that specific genus, family, order or class
(Fig. 5).
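
The stepwise escalation through taxonomic levels described above can be sketched as a simple loop. The statistical test itself is not specified in this section, so it is represented by a caller-supplied function returning a P-value; that function, the 0.05 cut-off and the toy P-values are all assumptions for illustration.

```python
RANKS = ("genus", "family", "order", "class")


def lowest_significant_rank(test_at_rank, alpha=0.05):
    """Test genus first and step up one rank at a time until a test is significant.

    test_at_rank is a hypothetical callable that returns the P-value obtained
    when the plasmid segment is compared with all members of the best-matching
    taxon at the given rank.
    """
    for rank in RANKS:
        p_value = test_at_rank(rank)
        if p_value < alpha:
            return rank, p_value
    return None, None  # not significant even at the class level


# Invented P-values for one plasmid segment.
toy_p_values = {"genus": 0.20, "family": 0.03, "order": 0.001, "class": 0.0001}
result = lowest_significant_rank(toy_p_values.__getitem__)
print(result)
```

Stopping at the first significant rank means that higher ranks are never tested once a narrower host assignment is supported.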

The majority of the plasmids presented genomic signatures that
were most similar to those of species within the phylum
Proteobacteria (Fig. 5). Most of these matches were also statistically
significant already on the genus or family level. Interestingly, all plasmids
had at least two regions with signatures matching species that differ
at least at the order level, supporting recombination. In addition, although
only statistically significant at the class level, the A1 segment in
plasmid pB3 and all plasmids from the α- and δ-clades, as well as the
B-segment in the plasmids from the α-clade, presented a genomic
signature most similar to that of species from the Coriobacteriales
order of the distantly related Gram-positive phylum Actinobacteria.
To further demonstrate recombination, a statistical test for a
cross-region comparison was also performed. In this test, only the best
match for a specific segment was compared with the best match for
the other segments in that plasmid. The results demonstrate
statistically different signatures between all segments that had a best hit on
the genus or family level in the above test, which further supports
recombination between plasmids from different hosts.

Discussion 

We analysed the complete backbone genomes of 25 IncP-1 plasmids
and demonstrated a divergence into seven distinct phylogenetic
clades, showed that recombination is a common feature of plasmid
backbone evolution, and found evidence of adaptation to different hosts.
Evolutionary studies of IncP-1 plasmids are often based on gains and
losses of transposons and other MGEs 20-22 . In particular, the lack
of inserted elements was considered to be a sign of ancestry, as in
plasmid pBP136, which has been suggested to represent the ancient
ancestor of all IncP-1β plasmids 22 . However, as MGEs are found
among plasmids in all described clades, the absence of these may be
a poor indicator of ancestry of the IncP-1 group. On the other hand,
we demonstrate that plasmid pBP136 is likely to be a recombinant
involved in recent recombination events, including parental
plasmids from the β-1 subclade and a hitherto unknown clade (Fig. 4).
An alternative view would thus be that pBP136 is a result of a β-1
subclade plasmid that has recombined, and exchanged regions, with
an ancestral plasmid lacking insertions. Whether such plasmids
without insertions exist, or whether insertions can be entirely
excised, is not yet clear. In any case, frequent insertions and
deletions of MGEs indicate the recent evolution of plasmids, but the
older trajectory of plasmid macroevolution must, as here, be based
Table 3 | Statistical significance of recombination using the Φ-statistics.
Test for statistical significance of recombination within the concatenated region A + B + C as well as in three subregions A (A1 + A2), B and C for all and six subgroups of sequences. Results indicating
statistical significance (P < 0.002 after a Bonferroni correction for multiple tests) for recombination appear in bold; all other results appear in normal text.



on events such as the mutation, speciation and recombination of 
the backbone core regions 32 . 

All investigated conjugative plasmids, including IncP-1
plasmids, contain at least one entry exclusion gene 33 , which prohibits
other plasmids in the same incompatibility family from
conjugating to that cell. This exclusion system is believed to confer an
evolutionary advantage to the plasmid as it frees the plasmid from
competition at segregation during cell division, and protects the
plasmid-bearing cell from too many conjugation events 33,34 . Laboratory
experiments suggest that surface exclusion systems in F plasmids
reduce the conjugation rate 100-300 times, and in IncP-1 plasmids
this reduction is 10-15 times 7,33 . As our results indicate frequent
recombination of IncP-1 plasmids, which requires the presence of
two plasmids in one cell, the experimental results indicating that
surface exclusion is leaky are supported by this retrospective study.
Furthermore, an early study indicates that different IncP-1 plasmids
can coexist in one cell for at least 50 generations 35 , which may allow
time for recombination. Recombination can function as a powerful
and essential driving force of evolution by deleting deleterious
mutations 36 , collecting beneficial mutations 37 and increasing the rate
of adaptation 38,39 . It is tempting to speculate that there is an optimal
balance between saving the plasmid from competition by
incompatible plasmids and, on the other hand, allowing sporadic mobility
and recombination with plasmids evolved in other host bacteria.

The three backbone regions in pBP136, identified in the
similarity plots, did not present a close similarity to any of the other
plasmids included in this study (Fig. 4). A BLAST search, which
did not find any sequences with a high similarity to these three
regions, suggests that previously undescribed IncP-1 plasmid clades
exist. It is therefore likely that we have so far seen only a fraction of the
IncP-1 plasmid diversity.

No correlation between clade identity and the geographic loca- 
tion of the plasmids was detected by simply comparing isolation site 
with clade identity. For example, the plasmids of the β-1 subclade 
were isolated from a hospital (London, UK), a wastewater treatment 
plant (Braunschweig, Germany), a herbicide spill (Minnesota, 
USA), industrial sewage (Japan), a mercury-contaminated river 
(Kazakhstan), Australia and a hospital (Japan) 40 . However, in 
addition to this apparent worldwide spread, our DNA signature 
analysis indicates historic isolation of IncP-1 plasmids in specific 
host bacteria (Fig. 5). Genomic signatures are species specific and 
are likely formed by host replication and repair mechanisms 31,41-43 , but 
may also be affected by environmental factors 44 . Given sufficient 
residence time, plasmid signatures ameliorate towards that of the 
chromosome 14,28,42 . We analysed the putative plasmid-host his- 
tory by using newly developed algorithms based on DNA words of 
five nucleotides, which were demonstrated to be superior to G + C 
or dinucleotide signals for classifying a sequence according to its 
origin 30,31 . The suggested hosts (Fig. 5) are within groups that are 
known to harbour IncP-1 plasmids 7 . All plasmids, except pMCBF1, 
had at least one segment with a genomic signature most similar to 
those of the Burkholderiales order of the Betaproteobacteria class 
(Fig. 5), signifying the importance of this group as a natural host 
for IncP-1 plasmids 14,41 . The finding that all plasmids had segments 
that clustered with different hosts was also supported by the cross- 
region analysis, which further supports recombination. Thus, IncP- 
1 plasmids are recombinants containing regions in their backbones 
descending from parental plasmids, which have evolved in different 
hosts and/or under different selection pressures for sufficient time 
for these unique genomic signatures to evolve. It is noteworthy that 
with some exceptions the suggested hosts of each segment A1, A2, 
B and C are similar for most members within each clade, indicat- 
ing that recombination happened early in the clade history and that 
amelioration towards a common DNA signature is slow. In most 
cases, the best signature match of a segment was statistically signi- 
ficant on the genus or family level, indicating specific adaptation 
to a host within that genus or family (Fig. 5). On the other hand, 
in some examples, the signature of the best match was statisti- 
cally significant only on the order or class level. The cross-region 
analysis was also unable to demonstrate a statistically significant 
difference for these regions. Part of the explanation for this low 
statistical significance might be that the latter regions have resided 
in several different hosts and have acquired a mixture of signatures. 
Further development of bioinformatics tools to analyse mixtures 
of signatures may provide interesting information about the host 
history of those plasmids whose best match to one specific host 
has low statistical significance. 

Overall mean plasmid dinucleotide 41 and trinucleotide signatures 14 
have previously been used to suggest plasmid hosts. The latter study showed 
that the evolutionary host range of the IncP-1 plasmids was broader 
than the narrow host range of the IncF and IncI plasmids. The hosts 
suggested in this study, for at least one of the segments in each plasmid, 
were often close to one of the top five host matches suggested 
for the overall, whole-plasmid analyses by Suzuki et al. 14 . However, 
in this study we also demonstrate the significance of homologous 
recombination in the evolution of IncP-1 plasmids. Segment-wise 
analyses demonstrated that the combination of a broad host range 
and recombination leads to the emergence of recombinant IncP-1 
backbones that contain segments of significantly different host origins. 
For example, for six plasmids, the A1 and B segment signatures 
showed a similarity to bacteria within Gram-positive Actinobacteria 
(Fig. 5). Interestingly, a recent report showed that the IncP-1 plas- 
mid pKJK5 can transfer to the Gram-positive Arthrobacter sp. 
strain 108 (also class Actinobacteria) in soil rhizosphere experi- 
ments; this Gram-positive bacterium was in fact the most frequent 
pKJK5 transconjugant 11 . The manner in which conjugation was 
detected showed that the plasmid entered the Gram-positive cell 
and expressed its fluorescence gfp marker gene, but the independent 
replication of the IncP-1 plasmids was not assessed. It cannot be 
excluded that IncP-1 plasmids were incorporated into the Gram- 
positive chromosome and ameliorated, and later recombined to 
contribute to the present plasmids. 

Figure 3 | Bootscan and SimPlot analysis. Analysis of the backbones of plasmids pAOVO02 (a), pB3 (b) and pB10 (c). Each coloured plot 
corresponds to a specific plasmid depicted in the colour schemes to the right. The Bootscan plot demonstrates the phylogenetic relationship to the 
reference strain, and the SimPlot demonstrates the genetic distances to the reference strain in different parts of the genome. Sudden alterations 
in bootstrap support, illustrated in the Bootscan plots, indicate recombination. Sequence similarity to the reference strains is represented in the 
similarity plots beneath the Bootscan plots. Obvious recombination crossovers are highlighted as dotted lines. High sequence similarity indicates 
recent recombination events. Low sequence similarity indicates ancient recombination events or, alternatively, recent recombination events involving 
unanalysed plasmids. 

Haines et al. 45 recently demonstrated that the IncP-1α plasmid 
RK2 has a mean G + C content of the backbone of 66.6 mol%, 
whereas the mean G + C content of pQKH54 (IncP-1γ) is only 
56.6 mol%, and suggested that pQKH54 has resided in a host spe- 
cies with a lower G + C content than that of RK2. The mean G + C 
content for our suggested hosts for RK2 is 63% whereas the mean 
G + C for the pQKH54 hosts is 57%, which fits well with the plasmid 
G + C. Moreover, the pKJK5 backbone genes had a 6.3% lower G + C 
ratio than that of R751, and these two plasmids were also suggested 
to have had different host histories 19 . The mean G + C content of 
our suggested hosts of pKJK5 and R751 is 60 and 65%, respectively. 
Thus, earlier speculations on plasmid relationships based on G + C 
content 19,45 can be substantiated by the DNA signature analysis, 
which has more predictive power than the G + C content and we can 
now point to possible hosts. 

Figure 4 | SimPlot analysis. Similarity plots with plasmids R751, pBP136 and pB3 as reference plasmids. Each coloured plot corresponds to a specific 
plasmid depicted in the colour schemes to the right and demonstrates the genetic distances from each plasmid to the reference strain in different parts of 
the genome. The similarity plot of R751 (a) highlights one putative recombination event in plasmid pB3 and three putative recombination events in plasmid 
pBP136. The similarity plots of these two plasmids (b, c) demonstrate that none of the plasmids included in this study are donors of the recombinant 
regions in pBP136. Instead, other plasmids from clades that were not previously described were probably involved in these recombination events. 

Perhaps the most important aspect of the evolution and adaptation 
of the IncP-1 backbone to its different bacterial hosts is the 
role of these plasmids in HGT and transportation of ABR genes 7,40,46 , 
which has major implications for the treatment of human pathogens. 
Several studies have demonstrated that IncP-1 plasmids can 
spread to 47,48 and be maintained in 40,49 many different bacteria. Our 
DNA signature analysis demonstrates that the IncP-1 plasmids 
have been isolated in, and adapted to, different hosts and/or the 
specific environments the host cells experienced over evolutionary 
time scales, implying a plasmid/host coevolution. Although surface 
exclusion has been known to be leaky 33 and incompatibility does not 
immediately segregate two plasmids 35 , the extent of direct contact 
between plasmids in the IncP family is unclear. The frequent pattern 
of recombination presented here indicates that interactions between 
IncP-1 plasmid backbones could be direct and not limited to inter- 
actions with a third-party MGE. This might be one explanation of 
the high ABR mobility in the IncP-1 family, strongly supporting the 
suggestion of Schlüter et al. 7 that IncP-1 plasmids may be viewed 
as one of the most potent vehicles for the spread and accumulation 
of multiantibiotic resistance within and between different bacterial 
communities. 

Methods 

Bacterial strains and plasmids and growth conditions. Pseudomonas putida 
UWC1 containing the previously exogenously isolated plasmids pMCBF1 and 
pMCBF6 (ref. 25) were grown overnight at 26 °C in Luria-Bertani medium 50 with 
10 g l⁻¹ of added NaCl and supplemented with 17 mg l⁻¹ of HgCl2. Escherichia coli 
were grown overnight at 37 °C in the same medium but supplemented with 
50 mg l⁻¹ of ampicillin. 

Molecular techniques. Plasmid DNA was obtained using QIAGEN MIDI preps, 
according to the manufacturer's recommendations (QIAGEN). Shearing of DNA 
to create a plasmid library was carried out by sonication for 30 s (Branson 1510 
sonicator). Sticky ends were filled with Klenow fragment according to the 
manufacturer's recommendations (MBI Fermentas). Sheared plasmid DNA was 
subcloned into the SmaI site of pBluescript II SK+ (Stratagene) by blunt-end 
ligation, and transformed by heat shock (42 °C, 2 min 30 s) into E. coli XL-1 Blue 
(Stratagene). Transformants were picked by blue-white selection; plasmid vectors 
were isolated and screened for inserts by cutting with restriction enzymes, and 
analysed on standard agarose gels. Vectors with positive inserts were used as 
templates in sequencing reactions. 

Sequencing. The DNA sequences from the inserts were obtained by using 
M13 forward and reverse primers from the pBluescript II SK+ and the 
ABI BigDye Terminator Cycle Sequencing kit (Applied Biosystems). Sequencing 
was carried out at KI Seq, CGR Sweden, on an ABI 373 automated DNA sequencer 
(Perkin-Elmer Applied Biosystems). DNA sequences were compiled using 
Contig Express from the Vector NTI Suite 6.0 (Informax). To close gaps in the 
sequence, internal custom primers (Invitrogen) were designed. To close gaps 
and confirm the sequence of the two plasmids, pMCBF1 and pMCBF6 were 
also sequenced by MWG Biotech AG (Ebersberg; www.mwg-biotech.com) in a 
'publication quality' DNA sequencing project, as described by MWG (both strands 
sequenced and a final data accuracy of > 99.995%). Sequences of pMCBF1 and 
pMCBF6 were deposited in GenBank; Nucleotide Core (accession numbers AY950444 
and EF107516). 

DNA and AA sequence analysis. DNA and AA sequences were aligned by 
using ClustalW included in the BioX program. Genetic distances were calculated 
using the protdist program included in the PHYLIP package (PHYLIP 3.66), using 
the Jones-Taylor-Thornton matrix. Gap regions were not eliminated before this 
analysis as the program itself drops those regions in affected comparisons. All 
gap regions were, however, removed from the DNA sequence alignment before 
the phylogenetic analysis. Phylogenetic network analysis and the Φ-statistics were 
carried out using the SplitsTree program 51 . The splits network (neighbour net) was 
constructed using the uncorrected P character transformation, which computes 
the proportion of positions at which two sequences differ, and the bootstrap values 
were derived from 1,000 bootstrap replicates. The SimPlot and Bootscan analyses 
were performed by using the SimPlot program 52 , with a window size of 200 and 
20 bp steps. 
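
The uncorrected P distance and the sliding-window scan described above can be illustrated with a minimal sketch. It is not the SimPlot or SplitsTree implementation: the function names are hypothetical, gapped positions are simply skipped (an assumption), and similarity is taken as 1 minus the uncorrected P distance, whereas SimPlot can also apply other distance models. The defaults mirror the window size of 200 and the 20 bp steps stated in the text.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected P distance: proportion of compared positions at which
    two aligned sequences differ. Positions with a gap ('-') in either
    sequence are skipped (an assumption of this sketch)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def sliding_similarity(query, reference, window=200, step=20):
    """Return a list of (window_start, similarity) along an alignment,
    where similarity = 1 - uncorrected P distance within the window."""
    result = []
    for start in range(0, len(query) - window + 1, step):
        dist = p_distance(query[start:start + window],
                          reference[start:start + window])
        result.append((start, 1.0 - dist))
    return result
```

In such a scan, a sudden change in similarity to a given reference between neighbouring windows is the kind of signal that the similarity plots display as a putative recombination crossover.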

Figure 5 | Analysis of genomic signatures to identify putative hosts. A signature profile was created according to the word frequency for each of the 
available 1,047 complete genomic bacterial DNA sequences. Further, segments A1, A2, B and C of each plasmid were tested independently against these 
profiles. A P-value indicating the statistical significance was also calculated and indicated for each best match together with the taxonomic level for which 
the significance was achieved. The background colours in the table demonstrate the order that the putative hosts belong to, and the specific host species 
are denoted as colour-coded abbreviations. 

All analyses of genomic signatures were based on single intact genomic segments 
(that is, without alignment and truncation of gap regions). The analysis was 
carried out by using the program PSTk-Classifier 30,31 , with a fixed-order Markov 
model of order 4 (that is, using a word size of five nucleotides). Profiles were first 
constructed for each of all 1,047 bacterial complete genome sequences currently 
available from GenBank. All four segments A1, A2, B and C in each of the 25 
analysed plasmids were then separately matched against these profiles. The Markov 
classifier determines a score for a bacterium to be the host for a given plasmid. 
In this way, we can rank various putative host bacteria for a given plasmid. We 
apply statistical techniques for assessing confidence in our predictions that the 
top-ranked candidate is the most likely host bacterium: First, we form a list A 
of the bacteria that are within 5% of the top score. Next, we form a list B of the 
top-ranked candidate and its closely related neighbours in the Entrez taxonomy 
database (http://www.ncbi.nlm.nih.gov/taxonomy). For this, we traverse the 
taxonomy up a fixed number of levels and collect all the bacteria that appear 
below that level. Next we remove from A, those bacteria that also appear in B. 
Now, our question can be precisely reformulated as follows: Is there a significant 
difference in scores between the putative hosts in the lists A and B? The null 
hypothesis is that there is no significant difference; the alternative hypothesis is 
that the scores in list B are significantly higher. Note that this kind of analysis 
does not test a single putative host but distinguishes between two sets of potential 
hosts. This is required to gain statistical power. In particular, it would assign 
significance to one taxonomically closely related group of bacteria as being the 
host as against all the others. We start our analysis on the genus level; that is, we 
analyse whether the best match is significantly different from the top 5% matches 
to host bacterial species outside the genus to which the best match belongs. If no 
statistical significance was achieved on the genus level, we moved up one level 
at a time until the class level was reached. 
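
The scoring step described above can be sketched as follows. This is an illustration of a fixed-order Markov model of order 4 (words of five nucleotides), not the PSTk-Classifier implementation; the function names, the add-one pseudocount and the per-base normalisation of the log-likelihood are assumptions made for this example.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def train_profile(genome, order=4, pseudocount=1.0):
    """Estimate log P(next base | previous `order` bases) from one genome
    by counting words of length order + 1 (five nucleotides for order 4)."""
    seq = genome.upper()
    counts = Counter()
    for i in range(order, len(seq)):
        word = seq[i - order:i + 1]
        if all(ch in ALPHABET for ch in word):  # skip ambiguous bases
            counts[word] += 1
    profile = {}
    for context in map("".join, product(ALPHABET, repeat=order)):
        total = (sum(counts[context + b] for b in ALPHABET)
                 + pseudocount * len(ALPHABET))
        profile[context] = {
            b: math.log((counts[context + b] + pseudocount) / total)
            for b in ALPHABET
        }
    return profile


def score(segment, profile, order=4):
    """Mean log-likelihood per scored base of a segment under a profile
    (the normalisation by length is an assumption of this sketch)."""
    seq = segment.upper()
    total, n = 0.0, 0
    for i in range(order, len(seq)):
        row = profile.get(seq[i - order:i])
        if row is not None and seq[i] in row:
            total += row[seq[i]]
            n += 1
    return total / n if n else float("-inf")


def rank_hosts(segment, profiles, order=4):
    """Rank candidate hosts for one segment, best score first.
    `profiles` maps a (hypothetical) host name to its trained profile."""
    return sorted(((score(segment, p, order), name)
                   for name, p in profiles.items()), reverse=True)
```

Each plasmid segment would be ranked separately against all genome profiles, which yields the ordered list of putative hosts on which the subsequent significance test operates.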



We applied the Mann-Whitney test 53 , a powerful non-parametric statistical 
test to identify whether two samples of observations have equally large values. 
It computes a test statistic based on the ranks of the elements in a joint series 
constructed from the two series. The Mann-Whitney test yields a P-value corresponding 
to the probability of observing a result at least as extreme as the observed series 
under the null hypothesis. There are several reasons to prefer the Mann-Whitney test 
in our application to other well-known tests, such as the Student's t-test. First, it 
is non-parametric, so it does not assume a fixed underlying distribution such as the 
Normal distribution, which parametric tests such as the Student's t-test do. It is also 
tailored for ordinal values; that is, the important aspect is the relative order of the 
data, not their absolute values. This is precisely what we are interested in: the ranks 
of various bacteria as putative hosts. Furthermore, it is more robust to outliers and 
hence less likely to assign spurious significance to such data. Finally, it is 
significantly more efficient than the Student's t-test, especially when the underlying 
distribution is far away from normal. 
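
The construction of lists A and B and the rank test can be sketched as below. This is a hedged illustration rather than the code used in the study: the function names are hypothetical, "within 5% of the top score" is interpreted as a relative difference to the top score (an assumption), the taxonomic neighbours are passed in as a ready-made set instead of being looked up in the Entrez taxonomy database, and the P-value uses a normal approximation without tie or continuity correction.

```python
import math


def split_lists(scores, taxon_members, tolerance=0.05):
    """Build the score lists A and B.
    scores: dict mapping host name -> classifier score (higher is better).
    taxon_members: names in the same taxon as the top-ranked host,
                   including the top-ranked host itself.
    B holds the scores of the taxon members; A holds the scores of the
    remaining hosts that lie within `tolerance` of the top score."""
    top = max(scores.values())
    near_top = {name for name, s in scores.items()
                if top - s <= tolerance * abs(top)}
    list_b = sorted((scores[n] for n in taxon_members if n in scores),
                    reverse=True)
    list_a = sorted((scores[n] for n in near_top - set(taxon_members)),
                    reverse=True)
    return list_a, list_b


def mann_whitney_greater(b, a):
    """One-sided Mann-Whitney test; alternative: values in b tend to be
    larger than values in a. U is obtained by pair counting, which is
    equivalent to the rank-sum formulation. Returns (U_b, P-value)."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    u_b = sum(1.0 if x > y else 0.5 if x == y else 0.0
              for x in b for y in a)
    n_b, n_a = len(b), len(a)
    mean = n_b * n_a / 2.0
    sd = math.sqrt(n_b * n_a * (n_b + n_a + 1) / 12.0)
    z = (u_b - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability
    return u_b, p_value
```

Under these assumptions, the procedure would be run first with the genus of the best match as the taxon and repeated one taxonomic level higher whenever the P-value is not significant. With the short lists that often arise in practice, an exact test would be preferable to the normal approximation used here.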

Another question of interest is whether homologous recombination has created 
plasmids containing genomic segments that have evolved in, and adapted to, 
different host bacterial species. As a complement to the test described above, we 
performed a cross-region comparison. We compared the best match obtained for each 
region, and its related neighbours in the hierarchy, against how it compares against 
the other regions. The null hypothesis is that two regions in a plasmid have evolved 
in the same host. The alternative hypothesis is that different regions have evolved in 
different hosts. This test is similar to the test described above with the difference that 
here we test the best matches against each other irrespective of the top 5% matches. 